ABSTRACT

Galaxy clustering and cosmic magnification can be used to estimate the dark matter power
spectrum if the theoretical relation between the distribution of galaxies and the distribution of
dark matter is precisely known. In the present work we study the statistics of haloes, which in
the halo model determine the distribution of galaxies. Haloes are known to be biased tracers
of dark matter, and at large scales it is usually assumed that there is no intrinsic stochasticity
between the two fields (i.e., r = 1). Following the work of Seljak & Warren (2004), we explore
how correct this assumption is and, moving a step further, we try to qualify the nature of
stochasticity. We use Principal Component Analysis applied to the outputs of a cosmological
N-body simulation as a function of mass to: (1) explore the behaviour of stochasticity in the
correlation between haloes of different masses; (2) explore the behaviour of stochasticity in
the correlation between haloes and dark matter. We show results obtained using a catalogue
with 2.1 million haloes, from a PMFAST simulation with a box size of $1000\,h^{-1}$ Mpc and with
about 4 billion particles.

In the relation between different populations of haloes we find that stochasticity is not
negligible even at large scales. In agreement with the conclusions of Tegmark & Bromley
(1999), who studied the correlations of different galaxy populations, we find that the
shot-noise subtracted stochasticity is qualitatively different from 'enhanced' shot noise and,
specifically, it is dominated by a single stochastic eigenvalue. We call this the 'minimally stochastic'
scenario, as opposed to shot noise, which is 'maximally stochastic'. In the correlation between
haloes and dark matter, we find that stochasticity is minimized, as expected, near the dark
matter peak ($k \sim 0.02\,h$ Mpc$^{-1}$ for a ΛCDM cosmology), and, even at large scales, it is of
the order of 15 per cent above the shot noise. Moreover, we find that the reconstruction of the
dark matter distribution is improved when we use eigenvectors as tracers of the bias, but the
reconstruction is still not perfect, due to stochasticity.



Key words: methods: N-body simulations - methods: statistical - galaxies: haloes - galaxies:
statistics - cosmology: dark matter.



1 INTRODUCTION 

The observational determination of the dark matter distribution is 
important not only to constrain cosmological parameters, but also 
to understand galaxy formation and the relation between the dark 
matter and the galaxy distributions. The dark matter distribution 
can either be estimated indirectly through the study of the galaxy 
distribution, or directly through weak gravitational lensing. Using 
the first approach, many galaxy redshift surveys, such as 2dFGRS
(e.g., Peacock et al. 2001) and SDSS (e.g., Tegmark et al. 2004b),
have mapped the three-dimensional distribution of around a million
galaxies to determine the real-space power spectrum P(k) of the
matter fluctuations. Results from these surveys, together with the
measurements of the CMB by WMAP, favour a flat, dark-energy
dominated cosmology (Spergel et al. 2003; Tegmark et al. 2004a). Of
course, when using maps of galaxies to determine the dark matter
distribution, one has to take into account that galaxies are a biased
tracer of dark matter, that the bias depends on galaxy properties and
that it can be scale-dependent and stochastic.

Seljak & Warren (2004, hereafter SW04) proposed using faint
galaxies as dark matter tracers, since these are expected to occupy
low-mass haloes, which have a large-scale bias approximately
independent of halo mass (e.g., Mo & White 1996; Sheth et al. 2001).
Pen (2004) suggested obtaining a three-dimensional dark matter map
and power spectrum using galaxy tomography, i.e., combining
projected weak lensing with the cross-correlation between galaxies
with distance information (from galaxy surveys). The galaxy-mass
correlation can also be measured using cosmic magnification, which
is the magnification of background sources due to the foreground
matter distribution. Scranton et al. (2005) detected a cosmic
magnification signal by correlating foreground galaxies with background
quasars. Zhang & Pen (2005, 2006) proposed studying cosmic
magnification using 21 cm emitting galaxies. All these approaches
assume a perfect correlation (r = 1) between galaxies and dark matter
and between galaxies of different luminosities. We stress here that
with the term stochasticity we refer to the scatter in the
correlation between two fields (we refer to Section 2 for a more rigorous
definition), without implication on the deterministic nature of the
universe. This is analogous to the quantum mechanical density
matrix, in which stochasticity corresponds to
a mixed state. When there are multiple bins, for example in mass,
deterministic means a pure state, which is determined by a single
state vector. Not being deterministic opens up the dimensionality of
the problem, and this paper quantifies this effect.

On the observational side, Wild et al. (2005) found
stochasticity between different galaxy populations, whether defined by
colour or by spectral type. On the theoretical side, SW04 found that
stochasticity is not negligible both between haloes and dark
matter and between haloes of different masses. Following SW04, we
explore in more detail the relative bias and stochasticity between
halo populations, with the goal of estimating the relative
importance of this scatter. We use here a method, based on
Principal Component Analysis (PCA), to isolate the stochastic
signal. The same method was used by Tegmark & Bromley (1999, hereafter
TB99) to study stochasticity between different galaxy
populations in the Las Campanas Redshift Survey. This method is here
tested using halo catalogues from N-body simulations performed
using the PMFAST code (Merz et al. 2005), where we assumed
that higher-mass haloes correspond to higher-luminosity galaxies,
as justified by the tight relation between halo mass and
luminosity (Guzik & Seljak 2002; Hoekstra et al. 2005; Mandelbaum et al.
2006; van den Bosch et al. 2007). We compare the stochastic signal
with the shot noise, which is a well understood stochastic field, and
stress the fundamental difference between our findings and a shot noise
model.

In Section 2 we give a brief review of the definitions of the
parameters used in this paper. In Section 3 we describe the
simulations and the generation of the halo catalogue. Section 4
presents the results on the correlation between haloes of different
mass. The bias and the stochasticity between haloes and dark matter
are described in Section 5. Finally, we summarize our conclusions
in Section 6.



2 BIAS AND STOCHASTICITY 

Observationally it has long been understood that galaxies of
different masses and colours have different clustering properties
(e.g., the recent works of Norberg et al. 2002; Zehavi et al. 2005;
Meneux et al. 2006; Wang et al. 2007; Swanson et al. 2008).
Similarly, it has been understood theoretically that haloes of different
mass are mutually biased. In recent years, this has been generalized
to allow for stochasticity in addition to bias: a better quantitative
model of biasing and stochasticity allows a more accurate
reconstruction of cosmological parameters and, even more importantly, a
better understanding of errors (e.g., Pen 1998; Dekel & Lahav 1999).

These concepts have been primarily discussed in the context
of two populations. In this paper, we generalize this to a
continuum distribution of populations, or bins. The generalization of bias
might appear straightforward, but a consistent and optimal
measure must be introduced. Stochasticity is even more complex, and
potentially an open-ended statistical description.

Stochasticity is a small effect, which describes a lack of
coherence between populations. In a Principal Component Analysis
framework there are different possible outcomes. (1) One might
find that a single parameter (eigenvector) describes the apparent
stochasticity between all pairs of populations. We call this
'minimally stochastic', since a single second parameter accounts for
most of the stochasticity. (2) The stochasticity might be
pluralistic, and might not be captured in one or a small number of
components. An example of the latter is shot noise, which is pairwise
uncorrelated and cannot be described by a single coherent
component. We call this 'maximally stochastic', where the number of
hidden variables is equal to the number of bins. The present work uses
numerical simulations to quantify stochasticity above and beyond
shot noise statistics.

Following the notation of Seljak & Warren (2004), we define
the bias between two populations (e.g., haloes and dark matter) as
the ratio of the power spectra of the two density fields:

    $b^2 = \frac{\langle \delta_h^2 \rangle}{\langle \delta_{\rm DM}^2 \rangle}$    (1)

where $\delta_h$ and $\delta_{\rm DM}$ are respectively the density fluctuations of haloes
and dark matter and the average is done over the modes.

We also define the cross-correlation coefficient r, which
quantifies the stochasticity between two density fields. In the case of
haloes and dark matter, it can be expressed as:

    $r = \frac{\langle \delta_h \delta_{\rm DM} \rangle}{\sqrt{\langle \delta_h^2 \rangle \langle \delta_{\rm DM}^2 \rangle}}$    (2)

where $-1 \le r \le 1$. If r = 1 there is no stochasticity, and the
distribution of dark matter can be derived from that of haloes, once the
bias is known.

Bias and cross-correlation coefficient are related through the
quantity $\sigma_b/b$, the relative rms fluctuations in b, defined as:

    $\left(\frac{\sigma_b}{b}\right)^2 = \frac{\langle (\delta_h - b\,\delta_{\rm DM})^2 \rangle}{\langle \delta_h^2 \rangle}$    (3)

From equations (2) and (3), we have the stochastic scatter:

    $\frac{\sigma_b}{b} = \sqrt{2(1-r)} \equiv S$    (4)

Hereafter, when we talk about stochasticity, we will refer to this
quantity S. In fact, even a small departure of r from 1 implies
non-negligible rms fluctuations. Stochasticity gives the error when
we estimate the dark matter density field from the halo distribution
only through the bias.
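
As an illustration of how quickly S grows when r departs from unity, the definitions of this section can be evaluated on a single k-bin. The following is only a minimal sketch: the function name and the input values are hypothetical, and the three inputs are assumed to be the halo auto-power, the halo-dark matter cross-power and the dark matter auto-power, already averaged over the modes.

```python
import math


def bias_and_stochasticity(p_hh, p_hm, p_mm):
    """Return (b, r, S) for one k-bin from auto- and cross-power values.

    p_hh = <delta_h^2>, p_hm = <delta_h delta_DM>, p_mm = <delta_DM^2>.
    The square root defining S requires r <= 1.
    """
    b = math.sqrt(p_hh / p_mm)            # bias, ratio of the power spectra
    r = p_hm / math.sqrt(p_hh * p_mm)     # cross-correlation coefficient
    s = math.sqrt(2.0 * (1.0 - r))        # stochastic scatter sigma_b / b
    return b, r, s


# Hypothetical numbers chosen so that r = 0.98.
b, r, s = bias_and_stochasticity(p_hh=4.0, p_hm=1.96, p_mm=1.0)
```

With these illustrative inputs a 2 per cent departure of r from unity already corresponds to a scatter S of about 20 per cent.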

In this work we qualify stochasticity between multiple halo 
populations and between haloes and dark matter. 



3 THE SIMULATIONS 

The simulations were performed using the PMFAST code, a
parallel, particle-mesh code (Merz et al. 2005). This code was run on
the CITA Itanium cluster, which has 8 nodes of 4 processors each,
and a total of 512 GB of RAM, allowing simulations with many
particles and a large dynamic range in mass. We ran several
simulations using the standard cosmology with $\Omega_\Lambda = 0.73$, h = 0.7
and box sizes from $100\,h^{-1}$ Mpc to $1000\,h^{-1}$ Mpc. The number of
particles ranges from $160^3$ to $1624^3$. In this paper we use the
results obtained with the largest simulation ($1000\,h^{-1}$ Mpc box size
and $1624^3$ particles). In this simulation the wavemode k can be as
small as $k = 0.6283 \times 10^{-2}\,h$ Mpc$^{-1}$, so that large scales are well
sampled. The particle mass is $\sim 1.75 \times 10^{10}\,h^{-1}\,M_\odot$. This samples
haloes both above and below $M_*$, representing rare and common
haloes. Halo catalogues are defined using the spherical overdensity
method (Cole & Lacey 1996): once the density peaks in the
particle field are identified, the overdensity in the cells around the peaks
is calculated; haloes are then defined to be the spherical regions
with overdensity $\delta = 178$.

Figure 1. Mass function of the halo catalogue (dotted line) compared
with the Press-Schechter (solid black line) and the Sheth & Tormen (gray
long-dashed line) approximations. Because of the limited resolution in
the simulation, we consider only haloes with a mass higher than $M_{\rm cut} =
1.5 \times 10^{12}\,h^{-1}\,M_\odot$ (gray vertical solid line).
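
The quoted smallest wavemode and particle mass can be cross-checked from the box size and the particle number. The sketch below is an illustration only: it assumes a flat universe, so that the matter density parameter is 1 - 0.73 = 0.27 (a value not quoted explicitly in the text), and it uses the standard critical density of about $2.775 \times 10^{11}\,h^2\,M_\odot$ Mpc$^{-3}$. The function names are hypothetical.

```python
import math

# Critical density in units of h^2 M_sun Mpc^-3 (standard value).
RHO_CRIT = 2.775e11


def fundamental_mode(box_size):
    """Smallest wavemode k = 2 pi / L, in h Mpc^-1, for a box of side L in h^-1 Mpc."""
    return 2.0 * math.pi / box_size


def particle_mass(box_size, n_side, omega_m):
    """Mass per particle in h^-1 M_sun for n_side^3 particles in the box."""
    return omega_m * RHO_CRIT * box_size ** 3 / n_side ** 3


k_min = fundamental_mode(1000.0)
# omega_m = 1 - Omega_Lambda is an assumption (flat universe).
m_p = particle_mass(1000.0, 1624, omega_m=1.0 - 0.73)
```

Under these assumptions the fundamental mode comes out at about $0.6283 \times 10^{-2}\,h$ Mpc$^{-1}$ and the particle mass at about $1.75 \times 10^{10}\,h^{-1}\,M_\odot$, matching the values given above.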

The mass function of the halo catalogues is then compared both
with the analytical Press & Schechter approximation (Press & Schechter 1974)
and with the Sheth & Tormen approximation (Sheth & Tormen
1999). Because of the limited resolution in the simulation, haloes
with too few particles have to be excluded from the catalogue.
As shown in Fig. 1, small-mass haloes depart from the Press &
Schechter and the Sheth & Tormen mass functions. We include in
the halo catalogue only haloes above the resolution limit, i.e., above
$M_{\rm cut} = 1.5 \times 10^{12}\,h^{-1}\,M_\odot$. The final halo catalogue consists of
$2.1 \times 10^6$ haloes, with masses ranging from $1.5 \times 10^{12}\,h^{-1}\,M_\odot$ to
$3.7 \times 10^{15}\,h^{-1}\,M_\odot$.

In order to consider the effect of the shot noise, we also
generated a random catalogue consisting of as many particles as the
number of haloes. This is the reference shot-noise catalogue. It should
be noted that the uniformly distributed random haloes do not obey
exclusion: real haloes cannot be spaced closer than a virial radius.
This leads to a slight error in modelling the shot noise, which we
neglect.

The dimensionless power spectra of the haloes and the dark
matter are shown in Fig. 2, together with the Poisson counting
error. At small scales ($k > 1.0\,h$ Mpc$^{-1}$) the halo power spectrum
strongly departs from the dark matter power spectrum. This is
because at those scales the halo power spectrum is dominated by the
shot noise. As will be shown in the next section, once we
subtract the shot noise, the halo power spectrum shows the same
small-scale turn-down as the dark matter power spectrum.



4 STOCHASTICITY BETWEEN HALOES OF DIFFERENT MASS

We start by studying the relation between haloes of different
masses. The goal of this section is to use PCA to determine whether
different halo populations are coherent or if stochasticity is present
and, if present, to determine its behaviour. As we will see below,
the eigenvalues of a shot-noise-type field are all the same
('maximally stochastic scenario'). We want to answer the question: does
the stochasticity between haloes of different mass behave in the
same manner?

Figure 2. Upper panel: dimensionless power spectrum of the dark matter
(black-dotted line) and of the halo catalogue (gray-solid line). The Poisson
error is shown with the solid-bullet line. Lower panel: ratio of the halo and
the dark matter power spectrum.

4.1 Applying PCA to the halo covariance matrix 

The first step is to divide our simulated haloes into bins, sorted by 
mass. We present the results obtained by dividing the halo cata- 
logue into 6 bins, but we will show in the next sub-section that the 
results do not change if we double the number of bins. The bins 
are chosen with equal number of haloes, such that the shot noise 
properties between them are similar. 

PCA is applied to the covariance matrix $\sigma_{ij}$ between the six
halo bins. A brief explanation of the way we apply PCA is given in
Appendix A. For each wavemode k, the covariance matrix is given
by:

    $\sigma_{ij} = \langle \delta_i \delta_j \rangle$    (5)

where the indices i = 1, ..., n and j = 1, ..., n refer to the halo bins,
and where the average is done over wavemodes during the power
spectrum and cross-power spectrum calculation.

In the same way as for the haloes, we divide the random cat- 
alogue in bins, calculate the power spectrum of each bin and the 
cross-power spectrum between the bins, and we apply the same 
procedure of PCA to the covariance matrix from the random cata- 
logue. 
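
The covariance matrix of equation (5) can be assembled from the Fourier amplitudes of each bin, for the haloes and for the random catalogue alike. The sketch below is an illustration and not the pipeline used for the paper: the function name, the data layout (`modes[i][m]` is the complex amplitude of bin i for mode m within one k-shell) and the toy numbers are all assumptions.

```python
def covariance_matrix(modes):
    """Return sigma_ij = <delta_i delta_j^*>, averaged over the modes of one k-shell.

    Only the real part of each product is kept; for a real density field the
    imaginary parts cancel once both k and -k are included in the average.
    """
    n_bins = len(modes)
    n_modes = len(modes[0])
    sigma = [[0.0] * n_bins for _ in range(n_bins)]
    for i in range(n_bins):
        for j in range(n_bins):
            total = sum((modes[i][m] * modes[j][m].conjugate()).real
                        for m in range(n_modes))
            sigma[i][j] = total / n_modes
    return sigma


# Toy input: two bins tracing the same field with biases 1 and 2, no noise.
field = [1 + 1j, -2 + 0.5j, 0.5 - 1j]
modes = [[1.0 * d for d in field], [2.0 * d for d in field]]
sigma = covariance_matrix(modes)
```

In this noiseless toy case the matrix has rank one (its determinant vanishes), which is the fully coherent situation with no stochasticity.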

In the upper panel of Fig. 3 we show the eigenvalues of the
halo covariance matrix (solid lines) compared to the eigenvalues of
the random covariance matrix (dotted lines). The first eigenvalue of
the halo catalogue is significantly higher than the others, which in
turn are very close to the eigenvalues of the random catalogue. As
expected, the shot-noise eigenvalues are all equal to each other (the
scatter at small k is due to counting error).

Figure 3. Upper panel: Eigenvalues of the halo bins covariance matrix
(solid thick lines) and of the random bins covariance matrix (dashed thick
lines). Only the first eigenvalue of the halo covariance matrix (darkest solid thick
line) is significantly higher than the others, suggesting the absence of
stochasticity in the halo correlation. The Poisson error is the same as in
the previous figure. Lower panel: Difference between the halo bins
eigenvalues and the corresponding eigenvalues from the random catalogue. The
first eigenvalue (solid black thick line) is still higher than the others, but we
notice here that also the second eigenvalue (long-dashed, dark-gray thick
line) tends to be significantly higher than the others at $k > 0.05\,h$ Mpc$^{-1}$,
suggesting that haloes are minimally stochastic.

Figure 4. $\Delta^2$ of the halo catalogue, both before and after random subtraction
(respectively gray solid and gray long-dashed lines). The first eigenvalue
is shown by the symbols, again both before and after random subtraction
(respectively squares and circles). The random-subtracted halo power
spectrum is now lower than the DM power spectrum (thin dashed black line) at
small scales, because we are neglecting here the 1-halo term.

To test if all the eigenvalues but the first one are a
consequence of shot noise, we subtracted from each halo eigenvalue
the corresponding eigenvalue of the random catalogue. The differences
$\Delta\lambda_i = \lambda_{i,{\rm halo}} - \lambda_{i,{\rm random}}$ are plotted in the lower panel of Fig. 3. In this
case, $\Delta\lambda_1$ is the dominant one, but we notice that $\Delta\lambda_2$ is also
higher than the other differences. This implies that there is one
additional, and primarily only one, source of stochasticity other than
the Poisson noise. In other words, the detected stochasticity is not a
shot-noise-like stochasticity. This result is in agreement with what
was found by TB99: studying the correlation between four 'clans'
of galaxies, TB99 found a principal component which traces the
matter, followed by a second eigenvalue that is
significantly larger than the remaining two. Their result was obtained for
scales around $k \sim 0.6\,h$ Mpc$^{-1}$, which is within the scales where
our second eigenvalue dominates.
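
The logic of the eigenvalue subtraction can be illustrated with a toy covariance matrix. The sketch below is not the analysis code of the paper: it assumes a covariance made of two coherent rank-one components plus an uncorrelated shot-noise term equal in every bin, and it uses a small hand-written Jacobi solver to stay self-contained. All names and amplitudes are hypothetical.

```python
import math


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-24):
    """Eigenvalues (descending) of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):          # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):          # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def model_covariance(components, noise):
    """Sum of coherent terms amp * u u^T plus an uncorrelated noise term on the diagonal."""
    n = len(components[0][1])
    cov = [[noise if i == j else 0.0 for j in range(n)] for i in range(n)]
    for amp, u in components:
        for i in range(n):
            for j in range(n):
                cov[i][j] += amp * u[i] * u[j]
    return cov


noise = 0.5  # same shot noise in every bin (bins with equal numbers of objects)
halo = model_covariance([(2.0, [1, 1, 1, 1]), (0.25, [1, -1, 1, -1])], noise)
random_cat = model_covariance([(0.0, [0, 0, 0, 0])], noise)
delta_lambda = [h - r for h, r in zip(jacobi_eigenvalues(halo),
                                      jacobi_eigenvalues(random_cat))]
```

In this toy model the random catalogue has four equal eigenvalues ('maximally stochastic'), while after subtraction only two differences survive: a dominant one and a single smaller one, which is the 'minimally stochastic' pattern described above.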

It is clear that this scatter in the relation between different
populations is due to a lack of further information on the populations.
Using PCA we cannot further determine the origin of this
scatter, which is beyond the goal of the present work. Recent studies
have shown that the bias between haloes and dark matter (and
therefore between different halo populations) could depend on
physical properties other than halo mass, such as halo formation
time (e.g., Gao et al. 2005; Gao & White 2007) and concentration
(Wechsler et al. 2006). In principle, one could test any possible
ingredient by applying PCA to an $N^P \times N^P$ covariance matrix, where
N is the number of bins used and P is the number of parameters
included.

Finally, as a check for the fact that most of the signal is
contained in the first eigenvalue, we compare $\lambda_1$ with the
dimensionless power spectrum of the complete halo catalogue. As shown in
Fig. 4, the first eigenvalue follows the halo $\Delta^2$, both with and
without shot-noise subtraction. The random-subtracted
halo $\Delta^2$ is now lower than the dark matter $\Delta^2$ at smaller scales,
because we are neglecting here the correlation of structures within
haloes (the 1-halo term).



4.2 Results with a different number of bins 

To test how the results described in the previous sub-section depend
on the number of bins used, we repeated the same exercise using 12
bins. As shown in Fig. 5, the outcome is the same: we find that there is
only one stochastic component other than the shot noise, i.e., haloes
are 'minimally stochastic'. The only difference is a slightly higher
scatter in the eigenvalues at small wavemodes, due to the smaller
number of objects in each bin.

Figure 5. Same as Fig. 3 but with a higher number of halo bins.



5 STOCHASTICITY BETWEEN DARK MATTER AND HALO BINS

In this section we first show how eigenvectors can be used to trace 
the bias. We then study the stochasticity between haloes and dark 
matter, confirming that stochasticity saturates at scales where the 
ACDM power spectrum peaks. Finally, we show how stochasticity 
can be reduced when the halo density field is weighted using the 
principal component from the previous section. 

5.1 Bias and Eigenvectors 

The bias between the haloes, random subtracted, and the dark
matter is shown in Fig. 6, where the highest value of the bias is the
one related to the bin with the higher-mass haloes. In the same figure
we also plot the components of the principal component, i.e.,
the eigenvector corresponding to the first eigenvalue derived in the
previous section. When multiplied by the sum of the squared biases of
all bins at each k, the squares of the components of the first eigenvector
follow the squared bias between the corresponding halo bin and the dark
matter. This is straightforward to demonstrate when r = 1. In this
scenario, in fact, the covariance is simply given by:

    $\sigma_{ij} = b_i b_j$    (6)

In the two-dimensional case, for example, we have:

    $\sigma_{ij} = \begin{pmatrix} b_1^2 & b_1 b_2 \\ b_1 b_2 & b_2^2 \end{pmatrix}$    (7)

and it is straightforward to show that the principal component of
this matrix is $v_1 = (b_1, b_2)$. The bias between different
populations can therefore be approximated by the eigenvectors of the
covariance matrix, at scales where $r \sim 1$ (see also TB99).

Figure 6. Bias between haloes of different mass and dark matter (solid
lines), compared to the components of the first eigenvector. The shot noise
has been previously subtracted. The higher biases (darker lines) correspond
to bins with higher mass haloes.
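
The rank-one argument above can be checked numerically for more than two bins. The following is a minimal sketch under the r = 1 assumption of equation (6); the bias values are hypothetical and the principal component is obtained with a plain power iteration rather than a full PCA.

```python
import math


def principal_component(matrix, iterations=200):
    """Leading eigenvalue and unit-norm eigenvector of a symmetric matrix (power iteration)."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))   # norm of sigma v, tends to the eigenvalue
        v = [x / lam for x in w]
    return lam, v


biases = [0.9, 1.2, 2.0]                          # hypothetical bin biases
sigma = [[bi * bj for bj in biases] for bi in biases]
lam, v1 = principal_component(sigma)
# For a unit-norm eigenvector, v_i^2 times the sum of squared biases gives b_i^2.
recovered_b2 = [lam * c * c for c in v1]
```

Here the leading eigenvalue equals the sum of the squared biases, and the rescaled squared components of the first eigenvector return the squared bias of each bin.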



5.2 Stochasticity 

We now explore in detail the behaviour of stochasticity between
haloes and dark matter.

The cross-correlation coefficient between each halo bin and
the dark matter is shown in Fig. 7 as a function of the wavemode k.
In all cases, the coefficient is close to 1 at large scales (i.e., lower
values of k), but it declines at higher values of k, with a more
abrupt decrease in the case of less massive haloes.

r is indeed close to 1 at large scales, as has always been
assumed in works involving the relation between dark matter and
haloes (or galaxies), but in order to verify how good this assumption
is, and to determine the effect of even a small departure of r from
unity, we consider the quantity $S = \sqrt{2(1-r)}$, which is plotted in
Fig. 8. Even when r is closest to unity we have $S \sim 0.2$, which
implies an error of ~20 per cent in the relation between the halo and
the dark matter density fields. As expected, stochasticity saturates
(S becomes flat) at $k < 0.02\,h$ Mpc$^{-1}$, which corresponds to the
peak of the ΛCDM power spectrum (e.g., Dodelson et al. 1996). If
stochasticity is due to a local process, one expects its power to be
flat at large scales. The stochasticity thus decreases with increasing
large-scale power. At large scales ($k < 0.02\,h$ Mpc$^{-1}$), the power
spectrum drops, and one would expect a minimum in the
stochasticity, which we indeed observe.

The values of r and S are shown both before and after the
shot-noise subtraction. Notice that, because of the subtraction of
the noise, the value of r can become greater than 1, in which case
$\sqrt{2(1-r)}$ is not real. We still show S in this case, computing
$\sqrt{2|1-r|}$ and assigning to S the sign of
(1 - r). Even after the shot-noise subtraction, the scatter in the bias
saturates at large scales, and it is of the order of 15 per cent.
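
The sign convention used for S when the noise-subtracted r exceeds unity can be written compactly. This is only a sketch of the convention described above; the function name is hypothetical.

```python
import math


def signed_stochasticity(r):
    """S = sqrt(2 |1 - r|), carrying the sign of (1 - r)."""
    return math.copysign(math.sqrt(2.0 * abs(1.0 - r)), 1.0 - r)


s_below = signed_stochasticity(0.98)   # r < 1: ordinary, positive S
s_above = signed_stochasticity(1.02)   # r > 1 after noise subtraction: negative S
```

Negative values of S therefore flag the wavemodes where the shot-noise subtraction has pushed r above 1.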
5.3 Using the principal component as weight

The results of the previous sub-section are now shown using the 
'weighted' values of the halo power spectrum: in the calculation of 
A 2 for the halo catalogue, the particle masses are multiplied by a 
weight depending on which mass-bin they belong to. The weights 
are given by the components of the eigenvectors obtained when 
diagonalizing the covariance matrix of the halo bins. 

From statistical theory, given k independent estimates
$e_i$ of the quantity to be measured, each with an associated
error $\sigma_i$, the best combined estimate is the weighted mean, given
by:

    $\bar{e} = \frac{\sum_{i=1}^{k} w_i e_i}{\sum_{i=1}^{k} w_i}$    (8)

where the weights are given by $w_i = 1/\sigma_i^2$.
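
A minimal sketch of this inverse-variance weighting is given below. The function name and the input values are hypothetical; the error returned for the combined estimate, $1/\sqrt{\sum_i w_i}$, is the standard result for independent estimates and is not stated in the text.

```python
def weighted_mean(estimates, sigmas):
    """Inverse-variance weighted mean of independent estimates and its error."""
    weights = [1.0 / s ** 2 for s in sigmas]
    mean = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    error = (1.0 / sum(weights)) ** 0.5
    return mean, error


# Two hypothetical estimates; the second has twice the error of the first.
mean, error = weighted_mean([1.0, 3.0], [1.0, 2.0])
```

The combined value is pulled towards the more precise estimate, and its error is smaller than either individual error.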

In our case the halo density field of each bin is given by the
biased dark matter density field plus an error $\sigma_{\rm noise}$:

    $\delta_{h,i} = b_i\,\delta_{\rm DM} \pm \sigma_{\rm noise}$    (9)

where again i refers to the halo bin considered. Dividing all
quantities by the bias, and using equation (8), we obtain:

    $\delta_{\rm DM} = \frac{\sum_{i=1}^{n} b_i\,\delta_{h,i}}{\sum_{i=1}^{n} b_i^2}$    (10)

We see that in our case the weight for each bin is given by the
square of the corresponding bias. Using the eigenvectors has
advantages over dividing the power of the bins: when fine bins are used,
the shot noise increases. Unless stochasticity dominates, more
information is used when using the eigenvector, which includes all
cross-correlations to measure the bias.
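
The bias-squared weighting can be sketched for a single mode as follows. The example assumes the same noise level in every bin, as in the derivation above; the biases, the noise values and the function name are hypothetical.

```python
def combine_bins(delta_h, biases):
    """Estimate delta_DM from the halo bins, weighting each delta_h,i / b_i by b_i^2."""
    numerator = sum(b * d for b, d in zip(biases, delta_h))
    denominator = sum(b * b for b in biases)
    return numerator / denominator


biases = [0.9, 1.2, 2.0]          # hypothetical bin biases
noise = [0.03, -0.02, 0.01]       # hypothetical noise realisation per bin
delta_dm_true = 0.5
delta_h = [b * delta_dm_true + e for b, e in zip(biases, noise)]
estimate = combine_bins(delta_h, biases)
```

Bins with a larger bias carry a larger signal relative to the same noise, so they dominate the combined estimate.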

Since the bias is not a measurable quantity in observations, we
use as weights for each bin the corresponding components of the
principal component. As was shown at the beginning of this
section, $b_i^2 = v_{1,i}^2 \times \sum_{j=1}^{n} b_j^2$ when r = 1. Therefore we effectively use
as weights the components of the first eigenvector corresponding
to the wavemode $k = 0.01885\,h$ Mpc$^{-1}$, the wavemode at which
the cross-correlation coefficient is closest to 1. At each k there are
6 eigenvectors, since r is not perfectly equal to 1, and each of them
has 6 components. We first weighted the bins using the components
of the principal component: the mass of the particles belonging to
bin j has been multiplied by the jth component of $v_1$. We
repeated this exercise also using the components of the second and third
eigenvectors.

In Fig. 9 the cross-correlation coefficient between the entire
halo catalogue and the dark matter is shown both in the weighted
and non-weighted case. When weighted using the first eigenvector,
the value of r gets closer to 1 at every k, and the increase is more
substantial at larger wavemodes. We see how the weighting done
using the eigenvectors corresponding to the secondary components
corresponds to low values of the cross-correlation coefficient, and
this is because these components correspond to noise.

Again, to see how the weighting process changes stochasticity,
we calculate the quantity $\sqrt{2(1-r)}$, which is shown in Fig. 10
as a function of k. In this case, the minimum error in the relation
between the entire halo population and the dark matter density field
is ~18 per cent.

Figure 9. Cross-correlation coefficient r between the haloes and the dark
matter, both in the weighted and in the non-weighted case. In the upper
panel the haloes are weighted using the first three eigenvectors, but only the
principal component (i.e., the first eigenvector) is significant. In the lower
panel the same result is shown after the subtraction of the noise.

We need to check if the statistical measurement error in the
data coming from the simulation could be the cause of the departure
of r from unity. We show that this cannot be the case, and the error
on r is small. As derived in Appendix B, $\Delta r$ as a function of scale
is given by $\Delta r(k) = \langle \epsilon^2(k) \rangle / \langle \delta_{\rm DM}^2(k) \rangle$, where $\epsilon$ is the random
density field. $\Delta r$ is shown in Fig. 11, where the error bars have
been centered at 0. The small values of $\Delta r$ make it insignificant
with respect to r, and even when considering possible effects of the
error, r cannot reach unity.

The density maps of the haloes, weighted using the
first 3 eigenvectors corresponding to $k = 0.01885\,h$ Mpc$^{-1}$, are
shown in Fig. 12.

As for the previous section, these last results do not depend on
the number of bins used. To avoid redundancy, we omit the plots.

Figure 10. Same as previous figure for the quantity $S = \sqrt{2(1-r)}$.



6 SUMMARY AND CONCLUSIONS 

We studied the correlation between different halo populations and
between haloes and dark matter using a PCA technique. To do so,
we used the output of an N-body simulation with $1624^3$ particles
in a $1000\,h^{-1}$ Mpc box. After dividing the halo catalogue into bins
sorted by mass, we applied PCA to the covariance matrix given by
the cross-power spectra of the halo bins.

Analyzing the haloes alone, we now understand the
stochasticity between haloes of different mass: we have one dominant
principal component, and a second component which is small and grows
in relative importance as one approaches the non-linear scales. This
second component dominates the apparent stochasticity. The remaining
components are all at the level expected from random sampling.
We call this 'minimal stochasticity', which is the opposite scenario
from what one might expect in a shot noise model, which we called
'maximally stochastic'. This result is in agreement with the
conclusions of Tegmark & Bromley (1999), who studied the correlation
between different galaxy populations.

When we consider the relation between haloes and dark
matter, we find that the highest eigenvalue is the best tracer of dark
matter. Even though this is the best that can be done, the error is still
at least of the order of 15 per cent, even at very large scales, and at
$\lambda > 300\,h^{-1}$ Mpc stochasticity is saturated as expected: at this scale
the dark matter power spectrum reaches its peak (for a ΛCDM
cosmology). Moreover, we show that the eigenvectors from PCA are a
better estimate for the bias, since they take into account all pairs of
bins and have less shot noise.

We also studied the correlation of halo mass bins, and the PCA
eigenvectors, with the total underlying dark matter. The
observation of galaxies and the measurement of their power spectra can
provide useful information on the distribution of dark matter if bias
and stochasticity are properly considered. When we use the
components of the principal component as weights for the calculation
of the power spectra of the haloes, the estimate of the dark matter
power spectra improves further.

We argue that this is a better way to calculate the
galaxy power spectra, to then estimate the underlying dark matter
power spectra, and to estimate the bias between galaxies
of different luminosity and dark matter.

Figure 11. Error on the cross-correlation coefficient. For visualization
purposes, the error bars have been centered at zero.



ACKNOWLEDGMENTS 

We thank Hugh Merz for the technical support in running the simu- 
lations. S.B. thanks the Department of Astronomy & Astrophysics 
at University of Toronto and the Canadian Institute for Theoretical 
Astrophysics where most of this work has been carried out. 






Figure 12. Density maps of the haloes. In the first one, the haloes have been
weighted using the principal component at $k = 0.01885\,h$ Mpc$^{-1}$. The second one is
weighted using the second eigenvector and the third one using the third
eigenvector.
APPENDIX A: PCA

Principal Component Analysis is used to identify patterns in data,
and to highlight how data are related to each other. The first step
is to calculate the covariance matrix of the data set. The
dimension of the matrix n is equal to the dimension of the data set. The
covariance matrix is defined as:

    $\sigma_{ij} = \langle (x_i - \mu_i)(x_j - \mu_j) \rangle$    (A1)

where i = 1, ..., n, j = 1, ..., n and $\mu$ indicates the mean of the data
for the considered dimension. The main diagonal, i = j, is given
by the variances of the data in that dimension, and the matrix is
symmetric.

PCA calculates the eigenvectors and the corresponding
eigenvalues of a covariance matrix. The eigenvector associated with the
highest eigenvalue is the principal component of the data set, and
it indicates the main direction along which the data are distributed.
The smaller an eigenvalue, the smaller the 'significance' of that
component.

In summary, the number of significant eigenvalues indicates
along how many directions the data are spread. A data set is
considered stochastic if more than one eigenvalue is different from zero,
and deterministic if only one eigenvalue is not null, in which case,
if you know one data point you know them all. In this paper we
generalize the nature of stochasticity to depend on the nature of the
eigenvalues. If all modes have an equal eigenvalue, as in shot noise,
we call it 'maximal stochasticity', while a single mode which
dominates the stochasticity is called 'minimal stochasticity'.
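
For two populations the link between the cross-correlation coefficient and the number of non-zero eigenvalues can be written in closed form. The sketch below is an illustration only: it assumes a covariance matrix with diagonal elements $b_1^2$ and $b_2^2$ and off-diagonal element $r\,b_1 b_2$, normalized to the dark matter power, and the function name and input values are hypothetical.

```python
import math


def eigenvalues_two_bins(b1, b2, r):
    """Eigenvalues of [[b1^2, r b1 b2], [r b1 b2, b2^2]], largest first."""
    trace = b1 * b1 + b2 * b2
    det = (b1 * b2) ** 2 * (1.0 - r * r)
    root = math.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


deterministic = eigenvalues_two_bins(1.0, 2.0, r=1.0)    # second eigenvalue vanishes
stochastic = eigenvalues_two_bins(1.0, 2.0, r=0.98)      # second eigenvalue is non-zero
```

With r = 1 only one eigenvalue is non-zero, which is the deterministic case; any r below 1 opens a second direction.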



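APPENDIX B: ERRORS

B1 Spread of the error when binning

Throughout the paper, we often bin along the wavemode k.
Quantities which depend on k are then averaged within the k-bins.
If $\sigma_x$ are the rms fluctuations of the quantity x(k), from basic
statistics the new error $\sigma_{\bar{x}}$ on each new average $\bar{x}(k_{\rm new})$ is given
by:

    $\sigma_{\bar{x}}(k_{\rm new}) = \frac{1}{N_{\rm bin}} \sqrt{\sum_{k=k_{\rm min}}^{k_{\rm max}} \sigma_x^2(k)}$    (B1)

where $k_{\rm min}$ and $k_{\rm max}$ are the extremes of the bin centered in $k_{\rm new}$,
and $N_{\rm bin}$ is the number of wavemodes contained in each bin.

B2 Errors on r and S

We derive the error on r assuming r = 1 (the no-stochasticity null
hypothesis). The cross-correlation coefficient between haloes and
dark matter is given by:

    $r(k) = \frac{\langle \delta_h(k)\,\delta_{\rm DM}(k) \rangle}{\sqrt{\langle \delta_h^2(k) \rangle \langle \delta_{\rm DM}^2(k) \rangle}}$    (B2)

If there is no stochasticity, we have $\langle \delta_h^2 \rangle = \langle \delta_{\rm DM}^2 \rangle + \langle \epsilon^2 \rangle$, where $\epsilon$
is the random density field. The cross-correlation coefficient then
becomes:

    $r(k) = \frac{\langle (\delta_{\rm DM}(k) + \epsilon(k))\,\delta_{\rm DM}(k) \rangle}{\sqrt{\langle \delta_{\rm DM}^2(k) \rangle \left( \langle \delta_h^2(k) \rangle - \langle \epsilon^2(k) \rangle \right)}}$    (B3)

where, in the denominator, we subtract the random field from the
halo bin.

After some simple arithmetic we get to the final steps which
show that, in the absence of stochasticity, the cross-correlation
coefficient is indeed unity:

    $r(k) = \frac{\langle \delta_{\rm DM}^2(k) \rangle + \langle \epsilon(k)\,\delta_{\rm DM}(k) \rangle}{\langle \delta_{\rm DM}^2(k) \rangle} = \frac{\langle \delta_{\rm DM}^2(k) \rangle}{\langle \delta_{\rm DM}^2(k) \rangle} = 1$    (B4)

Note that $\langle \epsilon(k)\,\delta_{\rm DM}(k) \rangle = 0$, where here the average is done over
various directions of k within a simulation.

Now, 1 being the expectation value of r, the error on r is given
by:

    $(\Delta r(k))^2 \equiv \langle (r(k) - 1)^2 \rangle = \langle r^2(k) \rangle - 1 = \frac{\langle \delta_{\rm DM}^2(k)\,(\delta_{\rm DM}(k) + \epsilon(k))^2 \rangle}{\langle \delta_{\rm DM}^2(k) \rangle^2} - 1 = \frac{\langle \epsilon^2(k) \rangle}{\langle \delta_{\rm DM}^2(k) \rangle}$    (B5)

where the average is done over an ensemble of simulations.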



